Radegen Biotechnology LLC © 2023 by Fernando Andrade, M.S. is licensed under CC BY-NC-SA 4.0. To view a copy of this license, 
visit http://creativecommons.org/licenses/by-nc-sa/4.0/


Radegen Biotechnology


A Systematic Paradigm to Cloning with Synthetic DNA 


By Fernando Andrade, M.S., CEO and CSO at Radegen Biotechnology, Clint, TX. 


This document introduces, for the first time, concepts for synthetic DNA assembly based on a novel
paradigm of modularity by manufacturability rather than by functional properties. The document sets
the stage for the development of systematic, rationally designed methods for tiered DNA assembly that
have never been described before and are therefore protected by Creative Commons proprietization
through disclosure to the public domain. This paradigm is a cornerstone of our business practices and
represents a critical competitive advantage for Radegen Biotechnology that warrants protection as
intellectual property.
Introduction: A Guide to gBlocks, Custom Genes and Cloning

Custom Genes and gBlocks are excellent products for cloning. Synthetic DNA from IDT can be used like
many other starting materials, such as genomic DNA or PCR products. It has the advantage that it can
be used straight from the tube, and the researcher is not limited to sequences found in nature.
Properly designing gBlocks ensures that your DNA sequence contains all the features required for
cloning your synthetic DNA into a vector. This guide describes how to design a DNA sequence for
synthetic DNA manufacturing, provides a brief overview of common cloning techniques, describes how to
design a DNA sequence for use with a specific cloning method, and covers a few of the most common
problems that users run into when using our products.


IDT has two product lines that are used for developing genetic constructs and cloning. The base
product is called a gBlock, and the higher-end product is called a Custom Gene. A gBlock is a
double-stranded linear DNA fragment that consists of only the desired sequence; it is delivered
desiccated at a final yield of 250 ng (125-250 bp), 500 ng (251-750 bp) or 1000 ng (751-3000 bp). A
Custom Gene is double-stranded circular DNA that consists of your desired sequence along with the
vector backbone. Custom Genes are also delivered desiccated, at a final yield of 4000 ng regardless of
size. Custom Genes are verified by next generation sequencing and are guaranteed to have the correct
sequence. gBlocks are of high quality but have not been clonally enriched, nor are they verified by
sequencing. For gBlocks, we use mass spectrometry and capillary electrophoresis for quality control.
Mass spectrometry of the oligonucleotides before assembly ensures that the correct template material
is used in the assembly. Capillary electrophoresis verifies that a fragment of the correct size was
assembled and that the preparation consists mostly of the desired product. IDT recommends that all
constructs assembled using gBlocks be screened and sequenced before use in an experimental setting.


Links to IDT product info pages:


gBlocks - https://www.idtdna.com/pages/products/genes-and-gene-fragments/sblocks-gene-fragments
Custom Genes - https://www.idtdna.com/pages/products/genes-and-gene-fragments/custom-gene-synthesis


DNA sequences for manufacturing synthetic DNA 

This guide first reviews the basics of designing a sequence for synthetic DNA manufacturing.
Manufacturing synthetic DNA is a process that involves the assembly of many oligonucleotides made by
organic chemical synthesis; this is the synthetic component of synthetic DNA. These short oligos are
then assembled via a proprietary enzymatic process, which is outside the scope of this guide and is
not described here. Instead, this section briefly reviews the sequence characteristics that are
problematic for this process. The purpose is to help our customers understand the limitations of
synthetic DNA and identify workarounds for assembling difficult constructs, so they can accomplish
their research goals! There are three main complexity characteristics that are especially problematic.


1) GC content: Our enzymatic assembly process uses enzymes such as DNA polymerase to assemble and
enrich DNA, so sequence characteristics that are problematic for DNA polymerase are also problematic
for synthetic DNA manufacturing. High GC content leads to the formation of stable secondary
structures that inhibit polymerase processivity. Regions of high GC content also raise the melting
temperature of the template DNA above the limits of in vitro PCR and inhibit the dissociation of the
double strand into the single-stranded template needed for amplification.


2) Secondary structures: Because our process is subject to the same pitfalls as DNA polymerase,
sufficiently stable secondary structures can be problematic. For the purposes of this guide, a
secondary structure is defined as a molecular conformation that differs from the native functional
conformation of the molecule. Examples of secondary structures in DNA include hairpin loops and
G-quadruplexes. Hairpins form when two regions within a ssDNA fragment are complementary in sequence
and anneal to each other. G-quadruplexes form in guanine-rich regions and consist of guanine tetrads
stacked on one another within a single strand. These secondary structures exist in nature and serve a
purpose, but for synthetic DNA manufacturing they pose a problem for two reasons: the processivity of
DNA polymerase is inhibited when it reaches a secondary structure, and secondary structures inhibit
complementary annealing between two oligos.


3) Tandem repeats: Repeat sequences are an issue because they can cause the assembly of a
“short-circuited” construct. A short-circuited construct occurs when two oligos that share a repeat
region anneal to each other at the wrong copy of the repeat and stitch the DNA together out of
register, so that the sequence between the repeats is skipped. Sequences with many repeats, or with a
single sufficiently long repeat, will be problematic.
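The three characteristics above can be approximated with simple sequence scans. The sketch below is a
minimal Python illustration and is not IDT's complexity screening algorithm: the window size, GC
limits, hairpin stem length, loop range and repeat length are hypothetical values chosen only for
illustration, and the G-quadruplex check uses the widely used "four runs of three or more G separated
by 1-7 nt" motif as a rough proxy for G-quadruplex formation.

```python
import re
from collections import defaultdict

# Sketch only: input is assumed to be an uppercase A/C/G/T string, and every
# threshold below is a hypothetical illustration value, not an IDT criterion.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Common G-quadruplex motif: four runs of 3+ G separated by loops of 1-7 nt.
G4_PLUS = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")
# The same motif on the complementary strand appears as C runs on this strand.
G4_MINUS = re.compile(r"C{3,}(?:[ACGT]{1,7}C{3,}){3}")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_windows(seq: str, window: int = 50, low: float = 0.25, high: float = 0.70) -> list[int]:
    """Start positions of sliding windows whose GC fraction is below `low` or above `high`."""
    return [
        start
        for start in range(len(seq) - window + 1)
        if not low <= gc_fraction(seq[start:start + window]) <= high
    ]


def hairpin_stems(seq: str, stem: int = 10, min_loop: int = 3,
                  max_loop: int = 50) -> list[tuple[int, int]]:
    """(i, j) pairs where seq[j:j+stem] is the reverse complement of seq[i:i+stem]
    and the two arms are separated by a loop of min_loop..max_loop bases."""
    hits = []
    for i in range(len(seq) - stem + 1):
        arm = revcomp(seq[i:i + stem])
        first = i + stem + min_loop
        last = min(i + stem + max_loop, len(seq) - stem)
        for j in range(first, last + 1):
            if seq[j:j + stem] == arm:
                hits.append((i, j))
    return hits


def g4_motifs(seq: str) -> list[tuple[int, int, str]]:
    """(start, end, strand) for non-overlapping G-quadruplex motif matches."""
    hits = [(m.start(), m.end(), "+") for m in G4_PLUS.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in G4_MINUS.finditer(seq)]
    return hits


def repeated_kmers(seq: str, k: int = 12) -> dict[str, list[int]]:
    """Each k-mer that occurs more than once, mapped to all of its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}
```

Regions flagged this way are only candidates for closer inspection; whether a sequence can be ordered
is still decided by IDT's own screening tool.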


IDT has developed a complexity screening tool that is available on the order entry page for each
synthetic DNA product. This tool quickly scans a DNA sequence, identifies complexity characteristics
and determines the severity of the complexities present. The total complexity profile is conveyed by
a complexity score that is calculated from the number and degree of problematic features. This score
serves as a benchmark for the manufacturability of a specific DNA sequence. For the gBlock product
line, a score under 10 indicates that the sequence will be accepted for manufacturing, and any score
of 10 or above is automatically rejected. For our
Custom Gene product line, a score below 10 automatically accepts the sequence for production. A score
of 10-39.9 allows the sequence to be ordered, but production will not start until the lab has had an
opportunity to review and accept the sequence; scores of 40 and above are automatically rejected.
Basing manufacturability on this score allows IDT to provide the best possible product with the
shortest turnaround times at an affordable cost. Aside from complexity characteristics, sequence
length is another important consideration. The gBlock product line is capped at 3 kb because of the
innate error rate associated with synthetic DNA manufacturing.
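The ordering rules above, together with the gBlock yield tiers from the introduction, amount to a small
decision table. A minimal sketch, assuming the complexity score is already available as a number from
the screening tool; the function names and the "gblock"/"custom_gene" labels are hypothetical. The text
does not say how a score between 39.9 and 40 is handled, so this sketch sends it to lab review.

```python
def complexity_decision(score: float, product: str) -> str:
    """Ordering outcome for a complexity score, following the thresholds above.

    `product` is "gblock" or "custom_gene" (labels invented for this sketch).
    Scores in the 39.9-40 gap are treated as "lab review" here (an assumption).
    """
    if product == "gblock":
        return "accept" if score < 10 else "reject"
    if product == "custom_gene":
        if score < 10:
            return "accept"
        if score < 40:
            return "lab review"
        return "reject"
    raise ValueError(f"unknown product: {product!r}")


def gblock_yield_ng(length_bp: int) -> int:
    """Delivered gBlock yield (ng) for a given length, per the tiers in the introduction."""
    if 125 <= length_bp <= 250:
        return 250
    if 251 <= length_bp <= 750:
        return 500
    if 751 <= length_bp <= 3000:
        return 1000
    raise ValueError(f"{length_bp} bp is outside the 125-3000 bp gBlock range")
```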


Every enzymatic process has an innate error rate. The in vivo mutation rate during DNA replication
has been estimated by several researchers, with estimates ranging from 1 mutation every 100,000 bp to
1 every 10,000,000,000 bp. The error rate of a PCR reaction is better defined and depends on the
polymerase being used. According to NEB, Taq polymerase has an error rate of 1 in every 3,300 bp,
while Q5 High-Fidelity polymerase has an error rate of 1 in every 1,000,000 bp (X). The error rate of
synthetic DNA manufacturing is multifactorial and is not tied to the error rate of one specific
reaction. Instead, synthetic DNA inherits the error rates of multiple processes, from oligo
manufacturing to the DNA enrichment step during final fragment assembly. The cumulative error rate for
synthetic DNA manufacturing is 1 in every 5,000-10,000 bp. This error rate becomes more problematic
with larger molecules because each molecule has a higher chance of carrying an error, which decreases
the fraction of molecules with the correct sequence in a gBlock preparation (Figure 1).

This information is especially important when building large constructs. A 9 kb construct, for
example, could hypothetically be assembled from three 3 kb gBlocks. Ideally, a researcher would first
clone each gBlock into a shuttle (intermediary) vector, then screen clones and identify one with the
correct sequence. The researcher would then excise the verified insert from the shuttle vector, either
by restriction digest or by PCR amplification. This material then serves as high-fidelity DNA for
assembling the 9 kb construct from the three cloned gBlock fragments. These types of complex assembly
reactions require the researcher to plan for subsequent assembly steps by including all necessary
features, such as restriction sites, overlap regions and recombinase recognition sites, in the DNA
sequence intended for synthetic DNA manufacturing. Custom Genes also serve as high-fidelity DNA
material for building larger constructs with a reduced workload: with Custom Genes, we do all the
cloning, screening and verification work for you! A good understanding of the different cloning
methods that can be used with synthetic DNA will help you better understand the features that you
should include in your DNA sequence for manufacturing.
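The effect of length described above can be made concrete with a simple model. Assuming errors occur
independently at each base at a constant rate (an idealization), the expected fraction of error-free
molecules of length L at 1 error per r bases is (1 - 1/r)^L. Treating each colony as an independent
draw from the preparation, the same probability gives the number of colonies to screen to find at least
one correct clone with a chosen confidence. The function names and the 95% default are illustrative
choices, not values from the text.

```python
import math


def fraction_error_free(length_bp: int, bases_per_error: float) -> float:
    """Expected fraction of molecules with no errors, assuming independent errors
    at a constant rate of 1 per `bases_per_error` bases."""
    return (1.0 - 1.0 / bases_per_error) ** length_bp


def colonies_to_screen(p_correct: float, confidence: float = 0.95) -> int:
    """Smallest number of colonies n with P(at least one correct clone) >= confidence,
    treating each colony as an independent draw with success probability p_correct."""
    if not 0.0 < p_correct < 1.0:
        raise ValueError("p_correct must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_correct))


if __name__ == "__main__":
    # Compare a single 3 kb gBlock with a single 9 kb molecule at both ends of
    # the 1 in 5,000-10,000 bp error-rate range quoted above.
    for length in (3000, 9000):
        for rate in (5000, 10000):
            p = fraction_error_free(length, rate)
            print(f"{length} bp at 1/{rate}: {p:.1%} error-free, "
                  f"screen {colonies_to_screen(p)} colonies")
```

Under this model a 3 kb fragment is error-free in roughly 55-74% of molecules, while a single 9 kb
molecule would drop to roughly 17-41%, which illustrates why cloning and verifying three 3 kb pieces
first is attractive.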


Overview: Common Cloning Methods 

A basic cloning system consists of three elements: a DNA element with sequence features that allow
for replication and selection, a simple single-celled model organism that allows easy manipulation of
the DNA element in vivo, and a set of molecular tools for modifying and generating DNA.
Microbiologists were the revolutionary driving force that identified the molecular tools necessary for
cloning and realized that creating an engineered DNA sequence was possible. Studies of the molecular
mechanisms that mediate bacteriophage infection and exclusion found that bacteria produce enzymes
that cleave DNA at specific recognition sequences. Microbiologists also characterized the properties
of plasmid DNA and the genetic elements that cause bacteria to maintain and propagate extrachromosomal
DNA. E. coli became the default organism for performing these genetic manipulations, since this
species of bacteria was already a central experimental model for understanding life.


DNA cloning was first described by S.N. Cohen and colleagues in 1973. Their findings seem simple by
today’s standards but have had a tremendous impact on society. Early descriptions of cloning simply
describe the ability to cleave DNA in vitro, ligate the DNA into a plasmid and propagate the DNA in
Escherichia coli. The first restriction enzymes (REs) used for constructing a novel (recombinant)
vector were EcoRI and EcoRII. RE-digested DNA was introduced into competent cells, and ligation of the
vector was accomplished in vivo. Soon after, the same group of researchers discovered that genes from
different species of bacteria can be expressed in E. coli, and that even genes from a mammalian source
can be expressed in bacteria. Since then, the basic technology used in the 1970s has been improved
several times, and new techniques have been developed that led to the construction of the first
artificial chromosome.
The general workflow of a cloning reaction is as follows. First, a recombinant plasmid is generated
using the molecular tools described above. Next, the recombinant plasmid is introduced into E. coli
cells that have been chemically permeabilized (made competent). Once the cells take up the
recombinant vector, the genetic elements it carries allow the vector to propagate and allow only cells
that harbor the vector to survive. The selection markers most commonly used to ensure that only
recombinant cells survive are antibiotic resistance genes. Cells transformed with the recombinant
vector are grown in medium containing the antibiotic that corresponds to the selection marker. Only
cells that successfully took up the recombinant plasmid carry the resistance gene needed to grow under
these otherwise lethal conditions. Solid agar growth medium is typically used for growing and
selecting recombinant cells. Cells are spread on the agar medium at concentrations low enough that
individual colonies will grow. Spreading cells at high concentrations leads to an agar plate with a
lawn of bacterial cells. Even though all cells within the E. coli lawn are technically recombinant,
the diversity of recombinant molecules is high enough that consistent results from DNA obtained from
these lawns are not achievable for practical, much less experimental, purposes. The concept of the
colony forming unit is of central importance in molecular biology and microbiology. The concept holds
that a colony formed on a solid growth medium is the result of a single pioneering, viable cell. For
molecular biology applications, it is assumed (and has been shown empirically) that during
transformation with recombinant DNA, a cell takes up only one vector molecule (X). When a cell
population is transformed and plated on a solid growth medium for selection, each colony therefore
theoretically represents a genetically clonal population. Plating dilute bacterial cultures to form
colony forming units is a very powerful method for generating a clonal population of DNA with highly
limited molecular diversity.
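Deciding how much of a transformation to plate so that colonies stay separate is simple arithmetic.
The sketch below assumes the researcher already has an estimate of transformants per µL of recovery
culture (a hypothetical input), a target colony count, and a spreadable plating volume of 50-200 µL;
all three defaults are illustrative assumptions rather than values from this guide. When the required
volume is too small to spread evenly, the culture is diluted ten-fold until it fits.

```python
def plating_plan(transformants_per_ul: float, target_colonies: float = 100.0,
                 min_volume_ul: float = 50.0, max_volume_ul: float = 200.0):
    """Return (dilution_factor, volume_to_plate_ul, expected_colonies).

    transformants_per_ul is the researcher's own estimate (hypothetical input).
    Volumes below min_volume_ul are treated as too small to spread evenly, so
    the culture is diluted ten-fold until the required volume is large enough.
    """
    if transformants_per_ul <= 0:
        raise ValueError("transformants_per_ul must be positive")
    dilution = 1
    volume = target_colonies / transformants_per_ul
    while volume < min_volume_ul:
        dilution *= 10
        volume = target_colonies * dilution / transformants_per_ul
    # Cap the volume; a sparse culture simply yields fewer colonies than targeted.
    volume = min(volume, max_volume_ul)
    expected = transformants_per_ul * volume / dilution
    return dilution, volume, expected
```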


Restriction enzyme-based cloning was the first cloning method and is still the most commonly used.
Major improvements by companies like NEB have increased the efficiency and fidelity of RE digestion and
improved its reaction conditions. The general workflow for this technique is as follows. A gene of
interest is identified and isolated by PCR amplification. Restriction enzyme recognition sites are
added to the sequence during the PCR reaction by adding the recognition sequence to the 5’ end of the
primers. Different RE sites are added to the sense and antisense primers to ensure that the DNA is
inserted into its target vector in the desired orientation. The modified DNA sequence, the cloning
amplicon, is then digested with its respective enzymes, leaving unique overhangs whose sequence is
derived from the RE recognition site. A second fragment of DNA (in this case the cloning vector)
containing the same RE sites is digested, leaving complementary sticky ends. These ends anneal so that
the sequence of interest is inserted in the intended direction. Today, introducing digested DNA
fragments into E. coli competent cells and relying on in vivo ligation is not the preferred method for
ligating DNA. Instead, the enzymes responsible for forming covalent bonds between cleaved DNA ends
(i.e., “ligating DNA”) are well characterized and have been optimized for high efficiency in vitro.
The two DNA fragments are permanently “stitched” together in vitro with a ligase, such as T4 DNA
ligase, which forms a phosphodiester bond between the 3’ hydroxyl of one nucleotide and the 5’
phosphate of the other. Some REs cleave DNA leaving blunt ends; blunt-end cloning requires that both
the insert and the vector be cleaved with blunt-cutting REs. The disadvantage of blunt-end cloning is
that the orientation of the insert relative to features on the vector cannot be controlled. Blunt-end
cloning, along with other simple methods such as TA cloning, is a great choice for cloning reactions
meant to identify and sequence cloned DNA for later use in building more complex and larger
constructs.
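The sticky ends produced by palindromic Type II enzymes follow directly from where each enzyme cuts
within its site. The sketch below hard-codes a few well-known enzymes (EcoRI G^AATTC, BamHI G^GATCC,
BglII A^GATCT, HindIII A^AGCTT, XhoI C^TCGAG); the table and function names are hypothetical. It derives
each enzyme's 5’ overhang, finds cut positions in a sequence, and checks whether a pair of enzymes
gives directional cloning, i.e. two insert ends that cannot anneal to each other.

```python
import re

# (recognition site 5'->3', top-strand cut offset from the start of the site)
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "BglII": ("AGATCT", 1),    # A^GATCT
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "XhoI": ("CTCGAG", 1),     # C^TCGAG
}


def sticky_end(enzyme: str) -> str:
    """5' overhang left by a palindromic site cut `offset` nt in on both strands."""
    site, offset = ENZYMES[enzyme]
    return site[offset:len(site) - offset]


def cut_positions(seq: str, enzyme: str) -> list[int]:
    """Top-strand cut coordinates for every (possibly overlapping) site in seq."""
    site, offset = ENZYMES[enzyme]
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]


def is_directional(enzyme_a: str, enzyme_b: str) -> bool:
    """True if the two enzymes leave ends that cannot anneal to each other.

    Overhangs from palindromic sites are themselves palindromic, so two such
    ends anneal exactly when their overhang sequences are identical.
    """
    return sticky_end(enzyme_a) != sticky_end(enzyme_b)
```

BamHI and BglII illustrate why the check compares overhangs rather than enzyme names: they recognize
different sites but leave the same GATC overhang, so using them together would not fix the insert's
orientation.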


Blunt-end and TA cloning provide researchers with strategies for easily and quickly identifying a
desired DNA fragment. These simple approaches are fast and provide an easy way to generate a
recombinant vector for screening and sequencing. Several TA cloning kits rely on the ability of Taq
polymerase to add an adenosine nucleotide to the 3’ end of an amplicon. These kits provide a
linearized vector whose ends carry a single 3’ thymidine overhang. The recombinant vector is generated
by targeting the insert amplicon to the vector through A-T base pairing, and the insert DNA is
covalently joined to the vector in an enzymatic ligation reaction.


Blunt-end cloning is a method of generating a recombinant vector by ligating DNA that does not have
sticky ends. For this approach, PCR amplicons with blunt ends are generated. A linearized vector is
generated by digesting the vector with a blunt-cutting RE or by PCR amplification. If PCR
amplification is used to generate both the linearized vector and the amplicon, a phosphorylation step
is needed to ensure that the reactive groups required to form a new phosphodiester bond are present.
Phosphorylation can be done after the PCR products are generated by treating either the amplicon or
the vector DNA with a kinase such as T4 Polynucleotide Kinase. A phosphorylated vector backbone or
amplicon can also be produced by using PCR primers that already carry a 5’ phosphate modification.
Aside from the inability to control the orientation of the insert, blunt-end cloning also suffers from
low efficiency because empty vector can recircularize by self-ligation. One approach to preventing
vector re-ligation is to ensure that the 5’ ends of the vector are not phosphorylated, so that only the
amplicon carries the phosphate groups required for a successful ligation reaction. NEB offers an
ingenious solution for blunt-end cloning with its PCR Cloning Kit. This system relies on a vector that
harbors a minigene, located in the multiple cloning site (MCS), that codes for a toxic peptide. When
DNA is ligated into the MCS, the minigene ORF is disrupted and production of the toxic peptide is
prevented, so cells transformed with the recombinant plasmid remain viable and each colony that grows
contains a plasmid with an insert. When a plasmid re-ligates on itself, producing an empty vector, the
toxic minigene ORF remains intact, the toxic peptide is produced and cells transformed with the empty
vector do not grow.
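The phosphorylation logic above can be reduced to counting 5’ phosphates at each junction. A minimal
sketch, assuming every end carries a 3’ hydroxyl and that ligase seals a nick only where the 5’ end is
phosphorylated; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BluntEnd:
    """One blunt end of a linear dsDNA molecule; only its 5' phosphate matters here."""
    label: str
    phosphorylated: bool


def strands_sealed(left: BluntEnd, right: BluntEnd) -> int:
    """Strands (0-2) a ligase can seal where two blunt ends meet.

    Each of the two nicks at the junction pairs a 3' hydroxyl (assumed present)
    with the 5' end across the junction, so each phosphorylated 5' end
    contributes exactly one sealable strand.
    """
    return int(left.phosphorylated) + int(right.phosphorylated)


def circle_outcome(junctions: list[int]) -> str:
    """Summarise a circular product from the strands sealed at each of its junctions."""
    if any(n == 0 for n in junctions):
        return "no covalent circle"
    if all(n == 2 for n in junctions):
        return "fully sealed circle"
    return "nicked circle"


vector_end = BluntEnd("dephosphorylated vector", False)
insert_end = BluntEnd("phosphorylated insert", True)
self_ligation = circle_outcome([strands_sealed(vector_end, vector_end)])
with_insert = circle_outcome([strands_sealed(vector_end, insert_end),
                              strands_sealed(insert_end, vector_end)])
```

A dephosphorylated vector cannot close on itself, while each vector-insert junction still gets one
sealed strand from the insert's phosphate; such nicked circles are generally repaired by the host after
transformation.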


Golden Gate Cloning was first described in 1996 by (xx) and (xx) as a method for using Type IIS
restriction enzymes to assemble multiple fragments and clone them into a vector in a single ligation
reaction. Type IIS restriction enzymes differ from traditional Type II restriction enzymes in that they
cleave DNA outside of their recognition site. The advantage of using Type IIS enzymes is that the
overhangs produced after cleavage have a sequence that is independent of the recognition site; instead,
the overhang sequence is unique and native to the sequence of interest. This means that multiple DNA
fragments can be modified with the same Type IIS recognition site and still produce unique overhangs
at each cleavage site. Because Type IIS enzymes cleave outside of a non-palindromic recognition
sequence, the orientation of the Type IIS recognition sequence is paramount. The three most commonly
used Type IIS enzymes leave 5’ overhangs that are 4 bp in length. Cleavage of the sense strand occurs
either 1 or 2 bp after the end of the recognition site, and cleavage of the antisense strand occurs
after either 5 or 6 bp. An enzyme that cleaves the sense strand 1 bp after the recognition site will
cleave the antisense strand after the 5th bp; an enzyme that cleaves the sense strand 2 bp after the
recognition site will cleave the antisense strand after the 6th bp. This information is key when
designing fragments meant for assembly. Not only does a fragment need the RE recognition site at the
ends of the amplicon, it also needs a 4 bp overlap whose sequence corresponds to the adjacent fragment.
Assembled fragments contain the native sequence at the assembly seams, as opposed to the residual
sites left behind by palindromic Type II RE sites. This process produces a scarless construct, a major
advantage of this cloning technique. The primary limitation of Golden Gate assembly is that the insert
and vector sequences must be free of the Type IIS site being used. A natural Type IIS site in an
amplicon or vector must be removed by site-directed mutagenesis that introduces a silent mutation. A
secondary limitation is that only 256 four-base-pair combinations are possible, and sticky ends that
differ by only 1 bp may ligate unintentionally.
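Because overhang choice decides which ends can ligate, a candidate set of Golden Gate overhangs can be
sanity-checked before ordering. The sketch below assumes 4 nt overhangs and flags (1) palindromic
overhangs, which can ligate to a copy of themselves, and (2) pairs that differ by fewer than
`min_distance` positions from each other or from each other's reverse complement, reflecting the 1 bp
mismatch risk described above. The threshold of 2 and all function names are assumed, not taken from
a published rule.

```python
from itertools import combinations, product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def all_overhangs() -> list[str]:
    """Every possible 4 nt overhang (4**4 = 256)."""
    return ["".join(p) for p in product("ACGT", repeat=4)]


def check_overhangs(overhangs: list[str], min_distance: int = 2) -> list[str]:
    """Problems found in a set of junction overhangs (an empty list means none found)."""
    problems = []
    for oh in overhangs:
        if len(oh) != 4:
            problems.append(f"{oh}: not 4 nt long")
        elif oh == revcomp(oh):
            problems.append(f"{oh}: palindromic, can ligate to a copy of itself")
    for a, b in combinations(overhangs, 2):
        # An end carrying `a` pairs with an end carrying revcomp(a), so ends from two
        # different junctions can mispair when `a` is close to `b` or to revcomp(b).
        distance = min(hamming(a, b), hamming(a, revcomp(b)))
        if distance < min_distance:
            problems.append(f"{a}/{b}: only {distance} mismatch(es)")
    return problems
```

Of the 256 possible 4 nt overhangs, 16 are palindromic, which further narrows the usable set.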


Gibson Assembly is a cloning technique that also relies on generating unique sticky ends whose
sequence corresponds to the gene of interest and is complementary to the sequence of the adjacent
fragment. Gibson Assembly is the proprietary commercial name for this method, which is also known as
isothermal assembly. The process was first described by the Venter group and was used to construct the
first artificial chromosome. The process relies on a 5’ exonuclease that chews back the 5’ end of each
strand of a DNA fragment, producing 3’ overhangs. The general workflow for cloning a single amplicon
into a vector is as follows. A cloning amplicon is generated that carries, at each end, overlap
sequence corresponding to the plasmid sequence meant to flank the insert. The overlap region is
typically designed to be 15-40 bp in length, with a GC content that gives a melting temperature
conducive to annealing between complementary fragments at the reaction temperature of the enzymes
being used. Linearized vector, the dsDNA fragment of interest, a 5’ exonuclease, a DNA polymerase and a
DNA ligase are added to the reaction. The exonuclease chews back the 5’ ends of the DNA fragments,
complementary regions anneal, the DNA polymerase fills in the gaps and the ligase covalently joins the
fragments. This method is especially useful for assembling large constructs and is highly efficient.
Isothermal assembly suffers from the same pitfalls associated with synthetic DNA manufacturing: regions
with extreme GC content, secondary structures and repeat regions are difficult to assemble with this
method.
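Adding Gibson overlaps to a synthetic fragment is mostly string slicing. The sketch below takes the
insert and the vector sequences that will flank it, copies `overlap` bases of each onto the insert
ends (enforcing the 15-40 bp range above), and reports a rough melting temperature for each overlap
using the basic formula Tm = 64.9 + 41 * (nGC - 16.4) / N. That formula ignores salt and
nearest-neighbor effects, so treat it as a first-pass estimate only; the function names and the 20 bp
default are illustrative assumptions.

```python
def basic_tm(seq: str) -> float:
    """Rough Tm in deg C: 64.9 + 41 * (nGC - 16.4) / N (no salt or nearest-neighbor terms)."""
    n_gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (n_gc - 16.4) / len(seq)


def add_gibson_overlaps(insert: str, vector_upstream: str, vector_downstream: str,
                        overlap: int = 20) -> tuple[str, float, float]:
    """Return (insert with vector overlaps added, Tm of left overlap, Tm of right overlap).

    vector_upstream:   vector sequence that will sit 5' of the insert (top strand)
    vector_downstream: vector sequence that will sit 3' of the insert (top strand)
    """
    if not 15 <= overlap <= 40:
        raise ValueError("overlap should be 15-40 bp")
    if len(vector_upstream) < overlap or len(vector_downstream) < overlap:
        raise ValueError("vector flanks are shorter than the requested overlap")
    left = vector_upstream[-overlap:]    # last bases before the insertion point
    right = vector_downstream[:overlap]  # first bases after the insertion point
    return left + insert + right, basic_tm(left), basic_tm(right)
```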


Gateway Cloning is a proprietary method for shuttling a gene of interest between various vector
backbones by site-specific recombination. This method differs greatly from the other methods described
here in that a novel construct is not built by ‘cutting and pasting DNA’; instead, the insert is moved
from one vector to another using cis-acting elements on the vector that act with trans-acting factors
to mediate a site-specific recombination event. For this method, a construct is generated that
contains recombinase recognition sites flanking the gene of interest. These sites recruit a lambda
phage-derived recombinase, which recombines the insert with the corresponding recombinase recognition
sites on a destination vector. Please see ThermoFisher’s product literature for more information about
this method.


Sequence design for specific cloning methods 

The advantage of using synthetic DNA for molecular biology applications is that experimental design
and validation can be done in silico. There are many software applications for in silico experimental
design, but none are specifically geared to designing a cloning reaction using synthetic DNA
fragments. The available software assumes that a gene of interest originates from a natural source and
generates protocols describing how to retrieve the gene of interest from that source by PCR. Current
software will provide primer sequences with the appropriate 5’ additions for digesting an amplicon with
the appropriate restriction enzymes, or develop a strategy for assembling multiple fragments in a
Gibson assembly reaction. The ideal in silico solution for synthetic DNA would not provide a method for
generating the DNA fragments for a cloning reaction; instead, its output would be the final sequence
required for performing the reaction and ordering from IDT. Free DNA sequence manipulation software
can be used to help visualize sequence design. Most free software allows a researcher to annotate a
sequence and generate vector maps, and provides basic information such as GC content, melting
temperatures and RE cleavage positions. Many paid packages (and some open-source software) offer the
same basic functionality but also provide in silico modeling that allows a researcher to
computationally validate an enzymatic reaction such as PCR.


The following design recommendations apply to both gBlocks and Custom Genes. It is important for a
researcher to know how each product is delivered. If a researcher is assembling a large construct from
many gBlocks, it is recommended that the individual gBlocks be cloned and sequenced before being used
for the larger assembly. This approach requires a plan for cloning each gBlock into an intermediary
vector, excising the insert, and cloning into the destination vector. Alternatively, a researcher can
order the fragments for a larger construct as Custom Genes. In this case, the researcher will need a
plan for excising each insert from its vector, assembling the larger construct and cloning the
construct into a destination vector.


Restriction enzyme-based cloning, blunt-end cloning and TA cloning are rudimentary methods best suited
for cloning one insert into a vector; they are not meant for assembling large constructs from multiple
fragments. Generating a synthetic DNA fragment for these methods is straightforward. A synthetic DNA
fragment meant for RE cloning should include the gene of interest (GOI) flanked by the RE recognition
sites, with a spacer sequence at each end (figure 4A). The terminal spacer sequence facilitates
recruitment of the restriction enzyme to the DNA molecule for cleavage and increases the reaction
rate. For blunt-end cloning, the synthetic DNA fragment should include only the GOI sequence; IDT
offers a 5’ phosphorylation modification that is recommended for blunt-end cloning (figure 4B). For TA
cloning, the researcher should order the GOI sequence without any modifications and then adenylate
the blunt ends of the DNA fragment (figure 4C). This process is referred to as “A-tailing” and can be
accomplished by treating a blunt-ended dsDNA fragment with Taq polymerase using the NEB protocol found
here: (https://www.neb.com/protocols/2013/11/01/a-tailing-with-taq-polymerase).
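A synthetic fragment for RE cloning can be written out in silico exactly as described: spacer, 5’
site, GOI, 3’ site, spacer. In this sketch the spacer sequence and its 6 nt length, the small enzyme
table and the function name are hypothetical choices. The check that each site occurs exactly once
catches both an internal site in the GOI and a site accidentally created at a junction. For blunt-end
and TA cloning the ordered sequence is simply the GOI, so no helper is needed.

```python
# Recognition sequences (5'->3') for a few common palindromic Type II enzymes.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT", "XhoI": "CTCGAG"}


def re_cloning_fragment(goi: str, enzyme_5: str, enzyme_3: str,
                        spacer: str = "GCGCGC") -> str:
    """spacer + 5' site + GOI + 3' site + spacer (top strand, 5'->3').

    The spacer is a hypothetical placeholder; any short sequence that creates
    no unwanted sites would play the same role.
    """
    if enzyme_5 == enzyme_3:
        raise ValueError("use two different enzymes so the insert orientation is fixed")
    fragment = spacer + SITES[enzyme_5] + goi + SITES[enzyme_3] + spacer
    # Each site must occur exactly once, or the enzyme would also cut elsewhere.
    for name in (enzyme_5, enzyme_3):
        if fragment.count(SITES[name]) != 1:
            raise ValueError(f"{name} site occurs more than once in the fragment")
    return fragment
```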
Golden Gate cloning is a robust restriction enzyme-based method for assembling large constructs from
multiple fragments in a vector. NEB and other companies have developed proprietary kits and vectors to
facilitate this process. The method does not strictly require these kits or vectors to work well, but
they are recommended because all the components are designed to work together and have been
benchmarked as high performing. Two basic cloning reactions can be accomplished with this method: a
single insert can be cloned into a vector, or multiple fragments can be assembled into a vector. Both
examples are described here, assuming a vector whose multiple cloning site contains a Type IIS
recognition site (figure 5a). Cloning one fragment into a vector requires that the researcher flank the
sequence of interest with the four base pairs of vector sequence meant to sit next to each side of the
insert. The RE recognition site is then added to each end of the sequence, with the 5’ start of each
recognition site oriented away from the GOI. Note that a spacer is needed between the end of the RE
recognition site and the overlap region. For restriction enzymes that cut the sense strand one base
pair after the end of the recognition site, one spacer nucleotide must be included between the end of
the recognition sequence and the beginning of the overlap sequence. For enzymes that cut the sense
strand two base pairs after the end of the recognition site, two spacer nucleotides must be included
between the end of the recognition sequence and the beginning of the overlap sequence. It is also
recommended that a spacer sequence be added to the outer ends, flanking each RE site, to facilitate
recruitment of the enzyme to the DNA template and increase the reaction rate.

A helpful method for designing a synthetic DNA fragment is to first design the final construct “on
paper”, i.e. in silico. Figure z.b depicts a regional map of a recombinant vector focused on the insert
sequence flanked by the plasmid sequence. The insert is annotated as the orange bar labeled “sequence
of interest”, and the vector sequence is depicted as the blue bar labeled “plasmid” (figure z.c). The
core of the gBlock sequence, starting at the 5’ end, includes four base pairs of 5’ vector sequence,
followed by the sequence of interest and four base pairs of overlap with the 3’ vector sequence
flanking the sequence of interest. Figure z.c shows the gBlock sequence of interest isolated from the
vector. Figure z.d shows the core gBlock flanked by the RE recognition sites and a spacer sequence at
each end. This sequence represents the final gBlock fragment that needs to be ordered from IDT. This
dsDNA fragment includes all the sequence elements needed to digest the fragment and target it to the
vector via complementary sticky ends. Figure z.1 shows a minimal multiple cloning site meant to
linearize the vector and produce 4 bp sticky ends at the 5’ and 3’ ends of the linearized vector.
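The Golden Gate design above can be written out the same way and then checked by simulating where the
enzyme cuts. The sketch assumes BsaI (GGTCTC, cuts 1/5), BsmBI (CGTCTC, 1/5) or BbsI (GAAGAC, 2/6),
takes the core sequence (4 bp of 5’ vector sequence + sequence of interest + 4 bp of 3’ vector
sequence) as input, and uses an arbitrary "A" as the 1-2 nt spacer and a hypothetical 6 nt end spacer.
The right-hand site is written as the reverse complement so that it points back toward the core.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Enzyme: (recognition site 5'->3', nt between the site and the 4 nt overhang)
TYPE_IIS = {"BsaI": ("GGTCTC", 1), "BsmBI": ("CGTCTC", 1), "BbsI": ("GAAGAC", 2)}


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def golden_gate_gblock(core: str, enzyme: str = "BsaI", pad: str = "A",
                       end_spacer: str = "GCGCGC") -> str:
    """end_spacer + site + pad nt + core + pad nt + reverse-complemented site + end_spacer.

    `core` = 4 bp of 5' vector sequence + sequence of interest + 4 bp of 3' vector sequence.
    """
    site, gap = TYPE_IIS[enzyme]
    fragment = (end_spacer + site + pad * gap + core
                + pad * gap + revcomp(site) + end_spacer)
    # Exactly one site in each orientation; any other copy would cut inside the core.
    if fragment.count(site) != 1 or fragment.count(revcomp(site)) != 1:
        raise ValueError(f"extra {enzyme} site found; remove it with a silent mutation")
    return fragment


def released_overhangs(fragment: str, enzyme: str = "BsaI") -> tuple[str, str]:
    """Top-strand sequence of the 4 nt overhang regions left at each end after digestion."""
    site, gap = TYPE_IIS[enzyme]
    left_cut = fragment.find(site) + len(site) + gap
    right_cut = fragment.rfind(revcomp(site)) - gap
    return fragment[left_cut:left_cut + 4], fragment[right_cut - 4:right_cut]
```

If the design is correct, the released overhangs equal the first and last four bases of the core,
which in turn match the 4 bp sticky ends of the linearized vector.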


Assembling multiple fragments is sometimes a necessity. From a synthetic DNA perspective, there are
times when manufacturing large or complex fragments is not practical, either for manufacturers due
to technical limitations or for consumers due to cost. In these cases, it is often helpful to break
these sequences down into smaller fragments that individually lack the complexity characteristics.
Golden Gate assembly is a good choice for assembling larger constructs that contain complexity
characteristics but whose sequences can be broken down into smaller fragments that individually lack
these characteristics. Golden Gate can be especially amenable to assembling fragments that contain
regions with tandem repeats, because fragments are targeted to each other by complementary annealing
of short sticky ends. Gibson assembly is weaker in this regard, since tandem repeats are one of the
limiting factors for that method. Gibson assembly uses longer complementary regions to target
fragments for assembly, and these longer overlaps have a higher chance of falling within a repeat
sequence, raising the potential for a short-circuited construct.

Identifying sub-fragments with reduced complexity is the first step in the process, and an educated
trial-and-error approach is the best way to do so. Start by obtaining a general overview of the
complexities present in the desired final construct. This is best accomplished with IDT’s complexity
screening tool, found on the IDT product info page. Once the complexity characteristics are
identified, the sequence can be broken down into sections that either lack or contain minimal
complexity characteristics. Next, break the sequence down into equal portions, trying different
fragment sizes, and screen the fragments for manufacturability using the complexity screener. A
researcher should break the sequence down into as few fragments as possible. The total number of
fragments will depend on the length of the overall construct and the manufacturability of each set of
sub-fragments. Golden Gate Assembly has been benchmarked to efficiently assemble up to 24 fragments
(please see NEB product literature for recommendations). Based on anecdotal experience here at IDT,
most cases will be resolved by breaking the larger construct into equal sub-fragments. More difficult
cases will require the researcher to increase the length of one sub-fragment while decreasing the
length of the other, to dilute the complexities found in each individual fragment.

Figure a.1 shows a sequence with a repeat feature on the 5’ and 3’ ends. This sequence has a
non-passing complexity score, but when it is broken down, the repeat regions are segregated into two
different sub-fragments, each with a complexity score that passes the manufacturability test. Figure
a.2 shows a regional map of a recombineered vector, with the plasmid sequence annotated with a blue
bar and the insert sequence annotated with an orange bar. Figure a.3 depicts the two sub-fragments
annotated in distinct shades of green. The 5’ sub-fragment has a 4 bp overlap region at its 5’ end
corresponding to the adjacent vector sequence and 4 bp of overlap sequence at its 3’ end corresponding
to the second sub-fragment. The second sub-fragment has a 4 bp overlap region on its 3’ end
corresponding to the adjacent vector sequence. Figure a.4 shows the final sub-fragment sequences,
modified with RE sites, that should be ordered from IDT.
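The equal-split step described above can be sketched as follows. This is a hypothetical helper, not
part of any IDT or NEB tool: it divides a core sequence (already carrying its 4 bp vector overlaps)
into roughly equal sub-fragments that share a 4 bp junction, and then flags junction overhangs that
are palindromic or clash with another overhang. Those two checks are common Golden Gate practice and
are an assumption here, not a specification quoted from the source.

```python
# Sketch: split a Golden Gate insert into roughly equal sub-fragments that
# share 4 bp junction overhangs, then flag overhangs likely to misassemble.
# The overhang checks (not palindromic, not duplicated, not the reverse
# complement of another) are assumed common practice, not a vendor spec.
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def split_for_golden_gate(core: str, n_parts: int,
                          overhang_len: int = 4) -> Tuple[List[str], List[str]]:
    """core already starts and ends with the 4 bp vector overlaps.
    Returns (sub_fragment_cores, overhangs); neighbouring parts share one
    overhang_len junction."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    last_start = len(core) - overhang_len
    bounds = [round(i * last_start / n_parts) for i in range(n_parts + 1)]
    parts = [core[bounds[i]:bounds[i + 1] + overhang_len] for i in range(n_parts)]
    overhangs = [core[b:b + overhang_len] for b in bounds]
    return parts, overhangs


def overhang_problems(overhangs: List[str]) -> List[str]:
    """List overhangs that could ligate to the wrong partner."""
    problems, seen = [], set()
    for oh in overhangs:
        if oh == revcomp(oh):
            problems.append(f"{oh} is palindromic")
        if oh in seen or revcomp(oh) in seen:
            problems.append(f"{oh} clashes with another overhang")
        seen.add(oh)
    return problems


core = "AATG" + "ATGGCTAGCAAAGGAGAAGAACTGTTCACCGGCGTGGTG" + "GCTT"  # hypothetical
parts, overhangs = split_for_golden_gate(core, 2)
print(parts)
print(overhangs, overhang_problems(overhangs))
```

Each part would then receive Type IIS sites and spacers, as described earlier, before being screened
and ordered. If a junction is flagged, shifting that boundary a few bases (the unequal-split strategy
in the text) is a natural next step.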


Gibson Assembly is an excellent method for assembling and cloning large constructs. Using in silico
modeling and visualization software is especially important here, since this type of software
provides information such as GC content and melting temperature for selected regions. NEB’s product
literature for Gibson Assembly lists only two prerequisites for fragment design: 1) adjacent fragments
should share a 15-40 bp overlap region, and 2) the melting temperature of this region should be equal
to or greater than 48 °C. Gibson Assembly is most amenable to assembling large constructs from
multiple sub-fragments that do not have tandem repeats, which may result in short-circuited
constructs. The method is also susceptible to high degrees of secondary structure and to extremes in
GC content. Please see NEB’s product literature for more information. The example provided below
depicts the process of assembling and cloning a construct from two sub-fragment gBlocks. Figure b.1
depicts a regional map of a recombineered vector centered on an insert sequence of interest, annotated
by the orange bar and flanked by plasmid sequence annotated by the blue bar. Figure b.2 depicts two
sub-fragments within the core sequence of interest that have overlap regions conforming to the design
specification for Gibson Assembly. Figure b.3 depicts the final gBlock sequences that should be
ordered from IDT.
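The two NEB design rules quoted above can be applied mechanically when choosing where two
sub-fragments overlap. The sketch below is a hypothetical helper: it grows an overlap centered on a
chosen junction from 15 to 40 bp and returns the shortest one with an estimated Tm of at least 48 °C
that occurs only once in the construct, a crude guard against repeat-driven short-circuiting. The Tm
estimate uses the basic GC formula 64.9 + 41 × (GC − 16.4) / N, which is an assumption and only a
rough approximation; NEB's own calculators use more detailed models. The example construct is
hypothetical.

```python
# Sketch: pick a Gibson overlap at a chosen junction that follows the two
# NEB rules (15-40 bp, Tm >= 48 C) and occurs only once in the construct.
# ASSUMPTION: Tm uses the basic GC formula 64.9 + 41*(GC - 16.4)/N, a rough
# estimate only; NEB's calculators use more detailed models.


def basic_tm(seq: str) -> float:
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def is_unique(construct: str, motif: str) -> bool:
    """True if motif occurs exactly once (overlapping hits included)."""
    first = construct.find(motif)
    return first != -1 and construct.find(motif, first + 1) == -1


def design_gibson_junction(construct: str, junction: int, min_len: int = 15,
                           max_len: int = 40, min_tm: float = 48.0):
    """Return (start, end) of the shortest overlap centred on junction that
    meets the length, Tm and uniqueness criteria."""
    for length in range(min_len, max_len + 1):
        start = junction - length // 2
        end = start + length
        if start < 0 or end > len(construct):
            break
        overlap = construct[start:end]
        if basic_tm(overlap) >= min_tm and is_unique(construct, overlap):
            return start, end
    raise ValueError("no overlap meets the criteria; try another junction")


construct = ("ATGGCTAGCAAAGGAGAAGAACTGTTCACCGGCGTGGTGCCCATCCTGGTC"
             "GAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTG")  # hypothetical
start, end = design_gibson_junction(construct, junction=45)
fragment_a, fragment_b = construct[:end], construct[start:]
print(end - start, round(basic_tm(construct[start:end]), 1))
```

Fragment A ends with the overlap and fragment B begins with it, so the two share exactly the chosen
region. Preferring the shortest passing overlap keeps the shared region small, which lowers the
chance that it falls inside a repeat.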


Gateway cloning is a proprietary system with highly protected intellectual property, so sequence
information for the recombinase recognition sites cannot be provided here. gBlock fragments for use
with this method require that the core gBlock sequence of interest be flanked by the appropriate
recombinase recognition sites (figure C). It is critical to note that this method cannot be used to
assemble a construct that consists of a single expression unit: adding recognition sites to the
terminal ends of sub-fragments would interrupt the continuity of any coding region or regulatory
element that spans them. Please see ThermoFisher product literature for more information about this
method.


Resolving complexity characteristics can be challenging, since it requires modifying the DNA sequence
and therefore altering the encoded information. Two approaches can readily be used to address a
subset of complexities: complexities located in a coding region, and GC content issues at the terminal
ends of a sequence, can both be easily addressed. Extremes in GC content at the terminal ends of a
sequence are problematic for synthetic DNA manufacturing, since these regions serve as primer binding
sites during the manufacturing process. If a DNA sequence does not pass the complexity screener due
to GC content at the terminal ends, then the addition of an adapter sequence usually mitigates the
issue. Table 4 shows several adapter sequences of different sizes. These sequences are neutral in
terms of GC content and do not contain any additional complexities. When adding a terminal adapter,
it is important to also develop a strategy for removing the terminal adapters from the dsDNA before
cloning. If an RE-based approach is being employed, the terminal adapter sequence should be removed
by digesting the fragment with the respective enzyme. If an isothermal assembly approach is being
employed, then the terminal adapter sequence will need to be removed in a manner that leaves a
scarless digested product. Adding a blunt-cutting palindromic Type II restriction site is not the best
option in this case, because the digested ends will retain three base pairs of the recognition site,
which will mismatch the target sequence. Figure x depicts a sequence of interest with high regional
GC content at the terminal ends of the construct. The complexity score for this sequence is xyz,
which does not pass the manufacturability benchmark. After the addition of terminal adapters, the
complexity score for this sequence is within the passing range. Figure X.2 depicts the sequence of
interest modified with complexity-neutral terminal adapters and MlyI Type IIS RE sites. MlyI is a
Type IIS RE that produces a scarless, blunt-end cleavage event. After RE digestion, the dsDNA of
interest is treated with a 5’ exonuclease to produce the long 3’ overhangs required for isothermal
assembly.
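The adapter strategy above can be sketched in a few lines. In this minimal illustration, a terminal
window GC check flags the problem, the insert is wrapped in adapters plus inward-facing MlyI sites
(GAGTC, cutting both strands 5 nt beyond the site to leave a blunt end), and a simulated digest
confirms that the insert is recovered without a scar. The 30 bp window and the 25-75% GC limits are
illustrative assumptions, not IDT's screening criteria, and the adapter and 5 nt spacer are
hypothetical stand-ins for the Table 4 sequences. The exonuclease step that follows digestion is not
modeled.

```python
# Sketch: flag GC-extreme terminal windows and wrap the insert in terminal
# adapters plus inward-facing MlyI sites (GAGTC, blunt cut 5 nt 3' of the
# site). ASSUMPTIONS: the window size and GC limits are illustrative, not
# IDT's criteria; the adapter and 5 nt spacer are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")
MLYI_SITE = "GAGTC"
MLYI_OFFSET = 5  # MlyI cuts both strands 5 nt beyond the site -> blunt end


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


def terminal_gc_ok(seq: str, window: int = 30, low: float = 0.25,
                   high: float = 0.75) -> bool:
    """True if both terminal windows fall within the (assumed) GC limits."""
    return all(low <= gc_fraction(end) <= high
               for end in (seq[:window], seq[-window:]))


def add_mlyi_adapters(insert: str, adapter: str, spacer5: str = "ACTTA") -> str:
    if len(spacer5) != MLYI_OFFSET:
        raise ValueError("spacer must match the MlyI cut offset")
    if MLYI_SITE in insert or revcomp(MLYI_SITE) in insert:
        raise ValueError("insert contains an internal MlyI site")
    left = adapter + MLYI_SITE + spacer5
    right = revcomp(spacer5) + revcomp(MLYI_SITE) + revcomp(adapter)
    return left + insert + right


def mlyi_digest(fragment: str) -> str:
    """Return the blunt-ended insert left after cutting at both MlyI sites."""
    first, last = fragment.find(MLYI_SITE), fragment.rfind(revcomp(MLYI_SITE))
    if first == -1 or last == -1:
        raise ValueError("MlyI site missing at one end")
    return fragment[first + len(MLYI_SITE) + MLYI_OFFSET:last - MLYI_OFFSET]


adapter = "ACGTTAGCATCGATGCTAAC"  # hypothetical GC-neutral adapter
insert = "GCGGCCGCGGGCCCGC" + "ATGGCTAGCAAAGGAGAAGAA" + "GCGGGCCCGCGGCCGC"
fragment = add_mlyi_adapters(insert, adapter)
print(terminal_gc_ok(insert, window=16), terminal_gc_ok(fragment, window=20))
print(mlyi_digest(fragment) == insert)
```

Because MlyI's cut lies outside its recognition site and leaves a blunt end, none of the site remains
on the recovered insert, in contrast to the blunt palindromic Type II sites discussed above.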


Codon optimization is the best approach for mitigating complexity features found within a coding
region. IDT has a codon optimization tool on the IDT website that is designed to generate a coding
sequence with alternate codons. This tool is specifically focused on generating a DNA sequence devoid
of complexity features that would cause a sequence to be rejected for manufacturing. It is critical to
note that this tool is not intended for generating a sequence with optimized expression
characteristics. Protein expression is a complex process influenced by many factors, and there is no
single proven in silico method for optimizing a coding sequence for expression. Gene expression
optimization is an empirical process that requires a researcher to try combinations of different
variables, such as sequence design, cell line considerations, and cell growth conditions, to determine
which combination produces the best-expressed genetic construct. The CodonOpt tool generates a
randomized sequence that maintains the same coding potential as the native sequence while lacking
the native complexity profile. The randomized sequence is generated by swapping native codons for
synonymous codons, which are picked based on the codon usage profile of the model organism in which
the sequence will be expressed. Please read the decoded article found here
(https://www.idtdna.com/pages/education/decoded/article/benefits-of-codon-optimization) for
more detail about the IDT CodonOpt tool. Codon optimization will mitigate most types of complexity
features found in a sequence, but there will be cases where even this approach is not productive.
Coding regions that produce a peptide with a tandemly repeated residue are difficult to solve because
repetitive codons cannot always be avoided. Complexities found outside of a coding sequence are more
difficult to mitigate, since these regions often contain cis-acting elements and any regulatory
information found in the primary DNA sequence will likely be lost.
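The synonymous-swap idea described above can be illustrated with a short sketch. It is not IDT's
CodonOpt algorithm: it simply replaces each codon with a randomly chosen synonymous codon from the
standard genetic code, weighted by a codon usage table, and it does not re-screen the result for
complexity. The usage weights shown are hypothetical placeholders, and sequences are assumed to be
uppercase DNA.

```python
# Sketch: randomise a coding sequence by swapping each codon for a
# synonymous codon, weighted by a codon usage table. NOT IDT's algorithm;
# the usage weights are hypothetical and no complexity re-screening is done.
import random

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (* = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(cds: str) -> str:
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def randomize_codons(cds: str, usage=None, seed=None) -> str:
    """Return a synonymous coding sequence. usage maps codon -> relative
    weight; codons missing from it default to weight 1.0."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    rng = random.Random(seed)
    usage = usage or {}
    new_codons = []
    for i in range(0, len(cds), 3):
        options = SYNONYMS[CODON_TABLE[cds[i:i + 3]]]
        weights = [usage.get(c, 1.0) for c in options]
        new_codons.append(rng.choices(options, weights=weights, k=1)[0])
    return "".join(new_codons)


# Hypothetical weights, e.g. to disfavour rare leucine and arginine codons.
usage = {"CTG": 5.0, "CTA": 0.2, "AGG": 0.2, "AGA": 0.2}
native = "ATGGCTAGCAAAGGAGAAGAACTGTTCACCGGCTAA"
variant = randomize_codons(native, usage, seed=7)
print(variant, translate(variant) == translate(native))
```

The sketch also shows why tandemly repeated residues are hard to fix: an amino acid with a single
codon, such as Met (ATG) or Trp (TGG), leaves no synonymous alternative to swap in.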


Discussion 

The use of synthetic DNA in synthetic biology, along with other areas of biological research, is
advancing science by offering two advantages: 1) synthetic DNA constructs can now be produced at
factory scale, beyond high-throughput scale, and 2) synthetic DNA allows for the generation of DNA
sequences not found in nature, enabling the investigation and development of novel technologies.
Factory-scale synthesis of DNA is facilitating the development of new therapeutics that address
inherited and infectious disease, as well as new solutions to problems of environmental and biological
sustainability. The key to using synthetic DNA is understanding the technology from a systems
perspective in order to take full advantage of it. The molecular biology toolkit developed over the
last 70 years has been used to advance new synthesis platforms and is used by researchers to
manipulate DNA for an intended application. The systems-based approach described here focuses on the
use of these tools in a synthetic DNA context. Molecular biology protocols written for traditional
workflows that use DNA template material derived from living organisms need to be re-evaluated to
give researchers guidance on using synthetic DNA templates. Providing researchers with information
about how a synthetic DNA construct is developed allows them to recognize that whole steps in
molecular biology workflows can be eliminated, accelerating the rate of discovery. Researchers are
now able to order a synthetic DNA construct over the internet and, a short time later, receive an
envelope in the mail with DNA ready for an intended application. Understanding how synthetic DNA is
generated also allows researchers to recognize the limitations of the technology and the solutions
that can help mitigate them. Codon optimization techniques allow researchers to design sequences that
code for proteins found in nature using novel synonymous DNA sequences. Modular DNA assembly
techniques, like Golden Gate assembly, allow researchers to produce DNA constructs with high degrees
of sequence complexity.

Modularity in DNA assembly is not a new concept; techniques like GoldenBraid 2.0 and MoClo use a
modular approach that allows researchers to develop a transcriptionally functional genetic construct
using ready-made “parts” or parts produced by PCR from a genomic template. Modules within these
systems are based on cis-acting regulatory elements, and modules are produced with terminal adapter
sequences carrying RE sites for use in a particular cloning method. This method of modularization is
now obsolete given recent advancements in synthetic DNA production. Modularity in synthetic DNA
assembly systems can now be based exclusively on the manufacturability of a DNA fragment. IDT
essentially has three levels of modularity, starting with an Ultramer oligonucleotide as the most
fundamental unit. Ultramer oligos can be duplexed into dsDNA and used in assembly techniques like
Golden Gate assembly to generate a DNA construct with high levels of complexity. Modules made as
gBlocks and Custom Genes can be designed for either Golden Gate or Gibson assembly to generate
larger constructs, such as Bacterial Artificial Chromosomes or even fully synthetic microbial genomes.
New methods and molecular tools focused on synthetic DNA need to be developed to bring a higher
degree of structure to these techniques and facilitate large and complex assemblies. GoldenBraid 2.0
is an excellent example to work from, since it employs tiered cloning. GoldenBraid vectors are
multipartite, meaning that there are entry vectors for producing a partially assembled construct and
destination vectors made to absorb modules originally harbored in an entry vector. In other words,
entry vectors with partial constructs can then be used as donor plasmids for assembling a final
functional construct. Tiers of GoldenBraid vectors are based on their intended use with a specific
Type IIS restriction enzyme. Entry and final vectors are designed for use with distinct Type IIS REs,
and modules are designed with all the Type IIS RE sites needed for the multi-tiered assembly process.
For example, an entry vector will use a primary RE like BsaI as its intended Type IIS RE site. Cloning
modules are then designed to have RE sites for an initial digestion with the primary RE and cloning
into the entry vector. Secondary sites, like BbsI, pre-designed in the original synthetic DNA fragment,
are then used to digest the assembled entry vector and excise modules that will be assembled into a
destination vector meant for use with BbsI. Since synthetic DNA can be synthesized with virtually any
DNA sequence, the ease of designing such modules cannot be overstated.
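A practical consequence of tiered assembly is that a module's core sequence must be free of internal
sites for every Type IIS enzyme used at any tier, in both orientations; otherwise the module is cut
internally at some stage. The sketch below scans for such sites. It uses BsaI (GGTCTC) and BbsI
(GAAGAC), following the example in the text; the function names and example sequences are
hypothetical.

```python
# Sketch: before ordering a module for tiered (GoldenBraid/MoClo-style)
# assembly, confirm the core sequence has no internal sites for any Type IIS
# enzyme used at any tier, in either orientation. BsaI and BbsI follow the
# example in the text; the example modules are hypothetical.
from typing import Dict, List

COMPLEMENT = str.maketrans("ACGT", "TGCA")
TIER_ENZYMES = {"BsaI": "GGTCTC", "BbsI": "GAAGAC"}


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def site_positions(seq: str, site: str) -> List[int]:
    """Start positions of site or its reverse complement in seq."""
    hits = []
    # A set avoids double-counting palindromic recognition sequences.
    for motif in {site, revcomp(site)}:
        pos = seq.find(motif)
        while pos != -1:
            hits.append(pos)
            pos = seq.find(motif, pos + 1)
    return sorted(hits)


def internal_sites(core: str,
                   enzymes: Dict[str, str] = TIER_ENZYMES) -> Dict[str, List[int]]:
    """Map each enzyme to the positions of any sites found inside core."""
    found = {}
    for name, site in enzymes.items():
        positions = site_positions(core, site)
        if positions:
            found[name] = positions
    return found


print(internal_sites("ATGGCTAGCAAAGGAGAAGAA"))
print(internal_sites("ATGAAGACTTGA"))
```

An empty result means the core is safe to use at every tier. A non-empty result identifies sites that
would need to be removed, for example with a synonymous codon change inside a coding region, before
the module is ordered.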